Jammu and Kashmir
He Leaked the Secrets of a Southeast Asian Scam Compound. Then He Had to Get Out Alive
A source trapped inside an industrial-scale scamming operation contacted me, determined to expose his captors' crimes--and then escape. It was a perfect June evening in New York when I received my first email from the source who would ask me to call him Red Bull. He was writing from hell, 8,000 miles away. A summer shower had left a rainbow over my Brooklyn neighborhood, and my two children were playing in a kiddie pool on the roof of our apartment building. Now the sun was setting, while I--in typical 21st-century parenting fashion, forgive me--compulsively scrolled through every app on my phone. The message had no subject line and came from an address on the encrypted email service Proton Mail: "vaultwhistle@proton.me." I'm currently working inside a major crypto romance scam operation based in the Golden Triangle," it began. "I am a computer engineer being forced to work here under a contract." "I've collected internal evidence of how the scam works--step by step," the message ...
- North America > United States > New York (0.24)
- Asia > Laos (0.05)
- Asia > Southeast Asia (0.04)
- (14 more...)
Evaluation of retrieval-based QA on QUEST-LOFT
Scales, Nathan, Schärli, Nathanael, Bousquet, Olivier
Despite the popularity of retrieval-augmented generation (RAG) as a solution for grounded QA in both academia and industry, current RAG methods struggle with questions where the necessary information is distributed across many documents or where retrieval needs to be combined with complex reasoning. Recently, the LOFT study has shown that this limitation also applies to approaches based on long-context language models, with the QUEST benchmark exhibiting particularly large headroom. In this paper, we provide an in-depth analysis of the factors contributing to the poor performance on QUEST-LOFT, publish updated numbers based on a thorough human evaluation, and demonstrate that RAG can be optimized to significantly outperform long-context approaches when combined with a structured output format containing reasoning and evidence, optionally followed by answer re-verification.
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.05)
- Europe > United Kingdom > England (0.04)
- Asia > India > Tamil Nadu (0.04)
- Asia > India > Jammu and Kashmir (0.04)
SANSKRITI: A Comprehensive Benchmark for Evaluating Language Models' Knowledge of Indian Culture
Maji, Arijit, Kumar, Raghvendra, Ghosh, Akash, Anushka, null, Saha, Sriparna
Language Models (LMs) are indispensable tools shaping modern workflows, but their global effectiveness depends on understanding local socio-cultural contexts. To address this, we introduce SANSKRITI, a benchmark designed to evaluate language models' comprehension of India's rich cultural diversity. Comprising 21,853 meticulously curated question-answer pairs spanning 28 states and 8 union territories, SANSKRITI is the largest dataset for testing Indian cultural knowledge. It covers sixteen key attributes of Indian culture: rituals and ceremonies, history, tourism, cuisine, dance and music, costume, language, art, festivals, religion, medicine, transport, sports, nightlife, and personalities, providing a comprehensive representation of India's cultural tapestry. We evaluate SANSKRITI on leading Large Language Models (LLMs), Indic Language Models (ILMs), and Small Language Models (SLMs), revealing significant disparities in their ability to handle culturally nuanced queries, with many models struggling in region-specific contexts. By offering an extensive, culturally rich, and diverse dataset, SANSKRITI sets a new standard for assessing and improving the cultural understanding of LMs.
Assessing Web Search Credibility and Response Groundedness in Chat Assistants
Vykopal, Ivan, Pikuliak, Matúš, Ostermann, Simon, Šimko, Marián
Chat assistants increasingly integrate web search functionality, enabling them to retrieve and cite external sources. While this promises more reliable answers, it also raises the risk of amplifying misinformation from low-credibility sources. In this paper, we introduce a novel methodology for evaluating assistants' web search behavior, focusing on source credibility and the groundedness of responses with respect to cited sources. Using 100 claims across five misinformation-prone topics, we assess GPT-4o, GPT-5, Perplexity, and Qwen Chat. Our findings reveal differences between the assistants, with Perplexity achieving the highest source credibility, whereas GPT-4o exhibits elevated citation of non-credibility sources on sensitive topics. This work provides the first systematic comparison of commonly used chat assistants for fact-checking behavior, offering a foundation for evaluating AI systems in high-stakes information environments.
- Media > News (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- (7 more...)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
CorIL: Towards Enriching Indian Language to Indian Language Parallel Corpora and Machine Translation Systems
Bhattacharjee, Soham, Roy, Mukund K, Poojary, Yathish, Dave, Bhargav, Raj, Mihir, Mujadia, Vandan, Gain, Baban, Mishra, Pruthwik, Ahsan, Arafat, Krishnamurthy, Parameswari, Rao, Ashwath, Josan, Gurpreet Singh, Dubey, Preeti, Kak, Aadil Amin, Kulkarni, Anna Rao, VG, Narendra, Arora, Sunita, Balbantray, Rakesh, Majumdar, Prasenjit, Arora, Karunesh K, Ekbal, Asif, Sharma, Dipti Mishra
India's linguistic landscape is one of the most diverse in the world, comprising over 120 major languages and approximately 1,600 additional languages, with 22 officially recognized as scheduled languages in the Indian Constitution. Despite recent progress in multilingual neural machine translation (NMT), high-quality parallel corpora for Indian languages remain scarce, especially across varied domains. In this paper, we introduce a large-scale, high-quality annotated parallel corpus covering 11 of these languages : English, Telugu, Hindi, Punjabi, Odia, Kashmiri, Sindhi, Dogri, Kannada, Urdu, and Gujarati comprising a total of 772,000 bi-text sentence pairs. The dataset is carefully curated and systematically categorized into three key domains: Government, Health, and General, to enable domain-aware machine translation research and facilitate effective domain adaptation. To demonstrate the utility of CorIL and establish strong benchmarks for future research, we fine-tune and evaluate several state-of-the-art NMT models, including IndicTrans2, NLLB, and BhashaVerse. Our analysis reveals important performance trends and highlights the corpus's value in probing model capabilities. For instance, the results show distinct performance patterns based on language script, with massively multilingual models showing an advantage on Perso-Arabic scripts (Urdu, Sindhi) while other models excel on Indic scripts. This paper provides a detailed domain-wise performance analysis, offering insights into domain sensitivity and cross-script transfer learning. By publicly releasing CorIL, we aim to significantly improve the availability of high-quality training data for Indian languages and provide a valuable resource for the machine translation research community.
- Asia > Pakistan (0.04)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Iceland > Capital Region > Reykjavik (0.04)
- (18 more...)
- Government > Regional Government > Asia Government > India Government (0.54)
- Law > Intellectual Property & Technology Law (0.46)
- Health & Medicine > Consumer Health (0.46)
FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
Nawale, Janki Atul, Khan, Mohammed Safi Ur Rahman, D, Janani, Gupta, Mansi, Pruthi, Danish, Khapra, Mitesh M.
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context.
- Asia > India > Bihar (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > India > Uttar Pradesh (0.04)
- (37 more...)
TITAN: Query-Token based Domain Adaptive Adversarial Learning
Ashraf, Tajamul, Bashir, Janibul
We focus on the source-free domain adaptive object detection (SF-DAOD) problem when source data is unavailable during adaptation and the model must adapt to an unlabeled target domain. The majority of approaches for the problem employ a self-supervised approach using a student-teacher (ST) framework where pseudo-labels are generated via a source-pretrained model for further fine-tuning. We observe that the performance of a student model often degrades drastically, due to the collapse of the teacher model, primarily caused by high noise in pseudo-labels, resulting from domain bias, discrepancies, and a significant domain shift across domains. To obtain reliable pseudo-labels, we propose a Target-based Iterative Query-Token Adversarial Network (TITAN), which separates the target images into two subsets: those similar to the source (easy) and those dissimilar (hard). We propose a strategy to estimate variance to partition the target domain. This approach leverages the insight that higher detection variances correspond to higher recall and greater similarity to the source domain. Also, we incorporate query-token-based adversarial modules into a student-teacher baseline framework to reduce the domain gaps between two feature representations. Experiments conducted on four natural imaging datasets and two challenging medical datasets have substantiated the superior performance of TITAN compared to existing state-of-the-art (SOTA) methodologies. We report an mAP improvement of +22.7, +22.2, +21.1, and +3.7 percent over the current SOTA on C2F, C2B, S2C, and K2C benchmarks, respectively.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.86)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.04)
- (3 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
- Education (0.68)
- Health & Medicine > Therapeutic Area > Oncology (0.47)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
How Does Response Length Affect Long-Form Factuality
Zhao, James Xu, Liu, Jimmy Z. J., Hooi, Bryan, Ng, See-Kiong
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in the responses would undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic and bi-level long-form factuality evaluation framework, which achieves high agreement with human annotations while being cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, where the model gradually exhausts more reliable knowledge, is the primary cause of factual degradation, rather than the other two hypotheses.
- Asia > India > Jammu and Kashmir (0.05)
- Asia > India > Andhra Pradesh (0.05)
- Asia > Singapore (0.04)
- (13 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Law (1.00)
- (3 more...)
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning
Das, Swadhin, Gupta, Saarthak, Kumar, and Kamal, Sharma, Raksha
Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes this representation to generate coherent textual descriptions. Recently, transformer-based models have gained significant popularity due to their ability to capture long-range dependencies and contextual information. The decoder has been well explored for text generation, whereas the encoder remains relatively unexplored. However, optimizing the encoder is crucial as it directly influences the richness of extracted features, which in turn affects the quality of generated captions. To address this gap, we systematically evaluate twelve different convolutional neural network (CNN) architectures within a transformer-based encoder framework to assess their effectiveness in RSIC. The evaluation consists of two stages: first, a numerical analysis categorizes CNNs into different clusters, based on their performance. The best performing CNNs are then subjected to human evaluation from a human-centric perspective by a human annotator. Additionally, we analyze the impact of different search strategies, namely greedy search and beam search, to ensure the best caption. The results highlight the critical role of encoder selection in improving captioning performance, demonstrating that specific CNN architectures significantly enhance the quality of generated descriptions for remote sensing images. Introduction With the advancement of remote sensing technologies and machine learning-based methods, the demand for Remote Sensing Image Captioning (RSIC) [1, 2] is growing rapidly. It plays a crucial role in various fields, including environmental monitoring, urban planning, and disaster management, by providing automated textual descriptions of satellite images.
- Asia > India > Uttarakhand > Roorkee (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (3 more...)
A Breadth-First Catalog of Text Processing, Speech Processing and Multimodal Research in South Asian Languages
We review the recent literature (January 2022- October 2024) in South Asian languages on text-based language processing, multimodal models, and speech processing, and provide a spotlight analysis focused on 21 low-resource South Asian languages, namely Saraiki, Assamese, Balochi, Bhojpuri, Bodo, Burmese, Chhattisgarhi, Dhivehi, Gujarati, Kannada, Kashmiri, Konkani, Khasi, Malayalam, Meitei, Nepali, Odia, Pashto, Rajasthani, Sindhi, and Telugu. We identify trends, challenges, and future research directions, using a step-wise approach that incorporates relevance classification and clustering based on large language models (LLMs). Our goal is to provide a breadth-first overview of the recent developments in South Asian language technologies to NLP researchers interested in working with South Asian languages.
- Research Report (1.00)
- Overview (1.00)